Message Passing On Communication-Exposed Multi-Core Processors

Authors

  • James Psota
  • Anant Agarwal
Abstract

Next-generation microprocessors will increasingly rely on parallelism, as opposed to frequency scaling, for improvements in performance scalability. Microprocessor designers are attaining such parallelism by placing multiple processing cores on a single silicon die. Current commercial multi-core processors such as the POWER and AMD Opteron force inter-processor communication to go through the memory system. However, some multi-core processors such as the MIT Raw processor offer first-class network support that exposes network resources at the ISA level, providing opportunities to interact with hardware resources in novel ways. This paper presents rMPI, which leverages the on-chip network of multi-core processors to build an abstraction with which many programmers are familiar: the MPI programming interface. This study uses the MIT Raw processor as an experimentation and validation vehicle, although the findings presented are applicable to multi-core processors with on-chip networks in general. Likewise, this study uses the MPI API as a general interface which allows parallel tasks to communicate, but the results shown in this paper are generally applicable to message passing communication. Overall, rMPI's design constitutes the marriage of message passing communication and on-chip networks, allowing programmers to employ a well-understood programming model to a novel high performance multi-core processor architecture. rMPI offers the following features: robust, deadlock-free, and scalable programming mechanisms; an interface that is compatible with current MPI software; an easy interface for programmers already familiar with high-level message passing paradigms; and fine-grain control over their programs when automatic parallelization tools do not yield sufficient performance. This paper compares rMPI running on such a multi-core processor to hand-coded applications running on one of the processor's low-level on-chip networks. rMPI is also compared to a commercial-quality MPI implementation running on a cluster of Ethernet-connected workstations. This paper evaluates various performance metrics such as latency, bandwidth, and performance scalability on a number of kernel benchmarks and applications. Results show that rMPI provides speedups of x to x for  processor cores, depending on the application, which equal or exceed performance scalability of the MPI cluster system. Furthermore, rMPI achieves overhead as low as  for real applications relative to hand-coded applications.

Similar Resources

A Fault Observant Real-Time Embedded Design for Network-on-Chip Control Systems

Performance and time-to-market requirements cause many real-time designers to consider commercial off-the-shelf (COTS) components for real-time systems. Massive multi-core embedded processors with network-on-chip (NoC) designs to facilitate core-to-core communication are becoming common in COTS. These architectures benefit real-time scheduling, but they also pose predictability challenges. In this work, w...

Highly Efficient and Predictable Group Communication over Multi-core NoCs

Massive multi-core embedded processors with network-on-chip (NoC) are becoming common in real-time systems. These architectures benefit real-time scheduling of tasks and provide higher processing capability due to an abundance of cores. The core-to-core communication can be leveraged by adopting message passing to further increase system scalability. Despite these advantages, multicores pose predi...

Parallelization of K-Means Clustering on Multi-Core Processors

Multi-core processors have recently become available on most personal computers. To get the maximum benefit of computational power from the multi-core architecture, existing algorithms and software need to be redesigned. In this paper we propose the parallelization of the well-known k-means clustering algorithm. We employ a single program multiple data (SPMD) approach based on a message passing...

Efficient and Predictable Group Communication for Manycore NoCs

Massive manycore embedded processors with network-on-chip (NoC) architectures are becoming common. These architectures provide higher processing capability due to an abundance of cores. They provide native core-to-core communication that can be exploited via message passing to provide system scalability. Despite these advantages, manycores pose predictability challenges that can affect both per...

Mixed-mode implementation of PETSc for scalable linear algebra on multi-core processors

With multi-core processors a ubiquitous building block of modern supercomputers, it is now past time to enable applications to embrace these developments in processor design. To achieve exascale performance, applications will need ways of exploiting the new levels of parallelism that are exposed in modern high-performance computers. A typical approach to this is to use shared-memory programming...

Publication date: 2006